ModDBS-X: A Diversity-Based Summarizer for DUC2001

Authors

  • Tadashi Nomoto
  • Yutaka Shinagawa
Abstract

1 Description of the system

ModDBS-X is a clustering-based single-document summarizer. It is an open-domain extractive summarizer that requires nothing of the input beyond basic IR statistics such as term and document frequency, so it can be adapted to any language or domain without much effort. The system generates a summary in three major stages: data preparation, summarization, and post-summarization. An input text is first checked for conformity to XML syntax, and the portions of it needed for summarization are extracted. These are passed on to the sentence selection step, which builds diverse topical clusters over the input and chooses representative sentences from them. The selected sentences then go through post-summarization, where parenthetical expressions are identified and removed.
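The three-stage pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical stand-ins, since the abstract does not specify them:

- the `<TEXT>` tag name;
- the regex sentence splitter;
- the tf-idf weighting;
- the single-pass leader clustering and its `threshold` and `k` parameters.

Each cluster approximates one topic. Keeping one representative sentence per cluster is what gives the summary its diversity.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def prepare(xml_text, field="TEXT"):
    """Stage 1, data preparation: parse, extract one field, split sentences."""
    # fromstring raises ET.ParseError if the input is not well-formed XML.
    root = ET.fromstring(xml_text)
    # Keep only the text under <TEXT> (an assumed DUC-style tag name).
    body = " ".join(" ".join(el.itertext()) for el in root.iter(field))
    body = " ".join(body.split())  # collapse runs of whitespace
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", body) if s]


def tfidf(sentences, doc_freq, n_docs):
    """Weight each term by tf * log(n_docs / df); unseen terms get df = 1."""
    vecs = []
    for s in sentences:
        tf = Counter(re.findall(r"[a-z0-9]+", s.lower()))
        vecs.append({t: c * math.log(n_docs / doc_freq.get(t, 1))
                     for t, c in tf.items()})
    return vecs


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select(sentences, doc_freq, n_docs, k=3, threshold=0.2):
    """Stage 2, summarization: cluster sentences, keep one per cluster."""
    vecs = tfidf(sentences, doc_freq, n_docs)
    # Single-pass leader clustering (a stand-in, not the paper's method):
    # join the cluster whose leader (first member) is most similar, if the
    # similarity reaches `threshold`; otherwise open a new cluster.
    clusters = []
    for i, v in enumerate(vecs):
        best = max(clusters, key=lambda c: cosine(vecs[c[0]], v), default=None)
        if best is not None and cosine(vecs[best[0]], v) >= threshold:
            best.append(i)
        else:
            clusters.append([i])

    def medoid(c):
        # Representative: the member most similar to the rest of its cluster.
        return max(c, key=lambda i: sum(cosine(vecs[i], vecs[j]) for j in c if j != i))

    # Take the k largest clusters (stable sort keeps document order on ties),
    # then emit their representatives in document order.
    chosen = [medoid(c) for c in sorted(clusters, key=len, reverse=True)[:k]]
    return [sentences[i] for i in sorted(chosen)]


def strip_parentheticals(sentence):
    """Stage 3, post-summarization: drop (...) spans, innermost first."""
    prev = None
    while prev != sentence:
        prev, sentence = sentence, re.sub(r"\s*\([^()]*\)", "", sentence)
    return sentence


def summarize(xml_text, doc_freq, n_docs, k=3):
    sentences = prepare(xml_text)
    return [strip_parentheticals(s) for s in select(sentences, doc_freq, n_docs, k)]


if __name__ == "__main__":
    doc = ("<DOC><DOCNO>X1</DOCNO><TEXT>The storm (a category 3 hurricane) "
           "flooded the coast. The storm hit the coast on Monday. Officials "
           "said schools will reopen next week.</TEXT></DOC>")
    # Toy statistics: "the" occurs in all 10 background documents, so its
    # weight is log(10/10) = 0; every other term defaults to df = 1.
    print(summarize(doc, {"the": 10}, n_docs=10))
    # ['The storm flooded the coast.', 'Officials said schools will reopen next week.']
```

In the toy run, the two storm sentences share enough weighted terms to fall into one cluster, so only one of them survives. Its parenthetical is then removed in the final stage. The only corpus resource the sketch consumes is a document-frequency table, which mirrors the abstract's claim that the summarizer needs nothing beyond term and document frequency.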

Similar resources

A Model for Text Summarization

Text summarization is a process for creating a concise version of one or more documents while preserving their main content. In this paper, to cover all topics and reduce redundancy in summaries, a two-stage sentence selection method for text summarization is proposed. At the first stage, to discover all topics, the set of sentences is clustered using the k-means method. At the second stage, optimum selection of sen...

An Optimization Model and DPSO-EDA for Document Summarization

We model document summarization as a nonlinear 0-1 programming problem in which the objective function is defined as the Heronian mean of objective functions enforcing coverage and diversity. The proposed model was implemented on a multi-document summarization task. Experiments on the DUC2001 and DUC2002 datasets showed that the proposed model outperforms the other summarization methods. Index Terms – ...

Enhancing extraction based summarization with outside word space

We present results from improving vector space based extraction summarizers. The summarizer uses Random Indexing and PageRank to extract the sentences whose importance is ranked highest for a document, based on vector similarity. Originally the summarizer used only word vectors based on the words in the document to be summarized. By using a larger word space model the performance of the sum...

Multi-Document Summarization using Automatic Key-Phrase Extraction

The development of a multi-document summarizer using automatic key-phrase extraction is described. This summarizer has two main parts: the first is automatic extraction of key-phrases from the documents, and the second is automatic generation of a multi-document summary based on the extracted key-phrases. The CRF-based automatic key-phrase extraction system has been used here. A document g...

Model-based Story Summary

A story summarizer benefits greatly from a reader model because a reader model enables the story summarizer to focus on delivering useful knowledge in minimal time with minimal effort. Such a summarizer can, in particular, eliminate disconnected story elements, deliver only story elements connected to conceptual content, focus on particular concepts of interest, such as revenge, and make use of...

Publication date: 2001